ABSTRACT

Context. Ongoing and future surveys of variable stars will require new techniques for analysing their light curves as well as tagging objects according to their variability class in an automated way.

Aims. We show the use of principal component analysis (PCA) and the Fourier decomposition (FD) method as tools for variable star diagnostics and compare their relative performance in studying the changes in the light curve structures of pulsating Cepheids and in the classification of variable stars.

Methods. We have calculated the Fourier parameters of 17,606 light curves of a variety of variables, e.g., RR Lyraes, Cepheids, Mira variables and extrinsic variables, for our analysis. We have also performed PCA on the same database of light curves. The inputs to the PCA are the 100 values of the magnitudes for each of these 17,606 light curves in the database, interpolated between phase 0 and 1. Unlike some previous studies, Fourier coefficients are not used as input to the PCA.

Results. We show that in general, the first few principal components (PCs) are enough to reconstruct the original light curves, whereas the FD method requires 2 to 3 times as many parameters to reconstruct the light curves satisfactorily. On average, computing the required number of Fourier parameters needs 20 times more CPU time than computing the required number of PCs. Therefore, PCA does have some advantage over the FD method in analysing the variable stars in a larger database. However, in some cases, particularly in finding the resonances in Fundamental mode (FU) Cepheids, the PCA results show no distinct advantages over the FD method. We also demonstrate that the PCA technique can be used to classify variables into different variability classes in an automated, unsupervised way, a feature that has immense potential for larger databases of the future.

Conclusions. 

Key words. Methods: statistical; Methods: data analysis; (Stars:) binaries: Eclipsing; Pulsating variables: RR Lyraes, Cepheids, 
MIRA 


Send offprint requests to: H. P. Singh

1. Introduction

Interest in the structure and properties of light curves of variable stars has increased considerably in recent years because of the large flow of observational data from variable star projects like OGLE (Optical Gravitational Lensing Experiment), MACHO (Massive Compact Halo Object), ASAS (All Sky Automated Survey) and NSVS (Northern Sky Variability Survey). In addition, new techniques for tagging variable objects expected in huge numbers from satellite missions like CoRoT (Convection Rotation and Planetary Transits), Kepler, and Gaia in a robust and automated manner are being explored (Debosscher et al. 2007, Sarro et al. 2009). The Fourier decomposition technique is a reliable and efficient way of describing the structure of light curves of variable stars. Schaltenbrand & Tammann (1971) derived UBV light curve parameters for 323 galactic Cepheids by Fourier analysis. The first systematic use of the Fourier technique was made by Simon (1979) for analyzing the observed light variations and radial velocity variation of AI Velorum. The first-order amplitudes and phases from the Fourier fits were then compared with those obtained from linear adiabatic pulsation models to obtain the mass of AI Vel. Simon & Lee (1981) made the first attempt to reconstruct the light curves of Cepheid variables using the Fourier decomposition and to describe the Hertzsprung progression in Cepheid light curves. The method has been applied extensively by various authors for light curve reconstruction, mode discrimination and classification of pulsating stars (Antonello et al. 1986, Mantegazza & Poretti 1992, Hendry, Tanvir & Kanbur 1999, [...] 2001, [...], Moskalik & Poretti 2003, Jin et al. 2004, Tanvir et al. 2005). However, Fourier decomposition by itself is not perfectly suitable for classification of variable stars in large databases as the method works for individual stars, but it can be used as a preprocessor for other automated schemes (Kanbur et al. 2002, Kanbur & Mariani 2004, Sarro et al. 2009).

The principal component analysis transforms the original data set of variables by way of an orthogonal transformation to a new set of uncorrelated variables or principal components. The technique amounts to a straightforward rotation from the original axes to the new ones and the principal components are derived in decreasing order of importance (Singh et al. 1998). The first few components thus account for most of the variation in the original data (Chatfield & Collins 1980, Murtagh & Heck 1987). The technique has been used for stellar spectral classification (Murtagh & Heck 1987, Storrie-Lombardi et al. 1994, Singh, Gulati & Gupta 1998), QSO spectra (Francis et al. 1992) and for galaxy spectra (Sodre & Cuevas 1994, Connolly et al. 1995, Lahav et al. 1996, Folkes, Lahav & Maddox 1996). There have been a number of studies on the use of PCA in analyzing Cepheid light curves (Kanbur et al. 2002) and RR Lyrae light curves (Kanbur & Mariani 2004). In both these studies, the input data to the PCA are the Fourier coefficients rather than the light
Table 1. Data selected for the present analysis

Data                    References                              No. of stars selected    Data set

RR Lyrae
  I band (LMC RRab)     Soszyński, I. et al. (2003)             5835                     IA
  I band (LMC RRc)      Soszyński, I. et al. (2003)             1751                     IB

Fundamental Cepheids
  I band (LMC)          Soszyński, I. et al. (2008)             1804                     IIA
  V band (LMC)          Martin et al. (1979)                    6                        IIB
  V band (LMC+SMC)      Moffett et al. (1998)                   13+6                     IIC

Overtone Cepheids
  I band (LMC)          Soszyński, I. et al. (2008)             1228                     III

Mira Variables
  V band                http://archive.princeton.edu/~asas/     2878                     IV

Eclipsing Binary
  I band (LMC)          Wyrzykowski et al. (2003)               2681                     VA
  I band (SMC)          Wyrzykowski et al. (2004)               1404                     VB


curves themselves. Nevertheless, it was noted that the PCA was 
able to reproduce the light curves with about half the number of 
parameters (PCs) needed by the Fourier technique. We may re- 
call that in the PCA, the first few PCs are usually examined as 
they contain most of the information about the data. 

The PCA has been applied to the light curves of Cepheid 
variable stars by Kanbur et al. (2002) and RR Lyrae stars by 
Kanbur & Mariani (2004). They concluded that PCA is more ef- 
ficient than the FD method in bringing out changes in the light 
curve structure of these variables. In our opinion, there is no ad- 
vantage in the way the PCA was applied because the Fourier 
coefficients were used as input to the PCA which are themselves 
the information-bearing coefficients of the light curve structure. 
Therefore PCA will not extract any additional information ex- 
cept the dimensionality reduction to a few orders. In the case of 
databases where a variety of variables are present, the method of 
application of PCA on Fourier coefficients is further complicated 
by the fact that the optimal order of fit to different light curves 
is different. When using Fourier coefficients as input to the PCA, one has to decide where to cut the Fourier fitting order. For the Fourier decomposition of FU Cepheids, precise Fourier components up to order ~ 10-15 are needed to explain the Cepheid bump progression, whereas RR Lyraes need fewer Fourier components (~ 2-7) to describe the light curve structure completely. Also, if the phase coverage is not smooth, fitting such light curves with a higher order may give rise to wiggles and false bumps that are not associated with the true light curve structure. Therefore it is not meaningful to use Fourier coefficients as input to the PCA when light curves of a large number of variable stars belonging to different variability classes are to be analysed. We demonstrate this with the following example:

Suppose a larger database of stars contains RRab, RRc and FU Cepheid variables. The RRc stars are always fitted with a lower order of the Fourier fit than RRab stars and FU Cepheids. Generally RRc stars need a fit of order ~ 2-5 because of the sinusoidal and symmetric nature of their light curves and RRab stars need order ~ 3-7 because of their asymmetric light curves, whereas some of the FU Cepheids need to be fitted with order ~ 10-15 to explain the bump progression. Therefore, if the light curves of FU Cepheids are fitted with fewer orders, the bump progression will not be fitted properly and one will miss the important bump feature. On the other hand, if all the light curves are fitted with a higher order, then one is basically fitting the noise in the case of RR Lyrae stars, which will also be reflected in the PCA.

One of the most important advantages of PCA over the FD method is that in PCA, all the light curve data can be processed and analysed in one go, provided all the phased light curves are brought to the same dimension, as we shall demonstrate later. In the FD method, by contrast, each light curve has to be fitted with its optimal order and analysed individually, which is a very time-consuming process for large databases. Moreover, the decision regarding the cut in the order of the fit is manual and hence very cumbersome. Unlike FD, one can decide where to make a cut in the PCA order in light curve reconstruction for all the light curves simply by looking at the cumulative percentages of variance in the data set. The data compression achieved using PCA is enormous, a fact that is quite relevant for the larger databases of the future.

PCA also has the advantage of preferentially removing noise from the light curve data and isolating bogus light curves, whereas precise Fourier decomposition needs well-defined and accurate light curves that are free from noisy, scattered data points and have good phase coverage. The most significant PCs contain those features which are most strongly correlated in many of the light curves. Therefore, the noise, which is uncorrelated with any other features, will be represented in the less significant components. By retaining only the most significant PCs to represent the light curves, we thus achieve a data compression that preferentially removes the noise. PCA can be used to filter out bogus features in the data as it is sensitive to the relative frequency of occurrence of features in the data set (Bailer-Jones et al. 1998). However, one distinct disadvantage of PCA is that the addition of a single light curve to the analysis requires the entire PCA to be redone.

In this paper, we show the use of PCA directly on the light curve data of more than 17,000 stars (RR Lyraes, Cepheids, Eclipsing binaries and Mira variables) taken from the literature and different existing databases. We also apply the FD method to these light curves to determine the Fourier parameters. Denoising should be carried out before the Fourier decomposition if the light curves are noisy. However, the photometric error in the light curves of the present selected database is very small, i.e., the light curve data have good photometric accuracy (~ 0.006 - 0.14 mag in the case of the OGLE database and ~ 0.02 - 0.220 mag in the case of the ASAS database). To investigate the noise in the light curves, we have calculated the unit-lag auto-correlation function on the residual light curves. The auto-correlations are found to be ≪ 1. Therefore no denoising has been carried out. However, outliers were present in some light curves. To remove these outliers, we have used a robust multi-pass non-linear fitting algorithm in IDL (Interactive Data Language). We use light curves (magnitudes at different epochs) as input to the PCA and compare the performance of PCA with that of the FD method in finding resonances in Cepheids and in the classification of different types of variables. We have, therefore, performed an independent automated Fourier analysis of all the data sets described in the paper using a computer code developed by us.

Another aim of this paper is to analyze the performance of PCA as a fast, automated and unsupervised classification tool for variable stars. Since one of the important aspects of this paper is a preliminary PCA-based classification, performed in an unsupervised way on larger sets of astronomical data, we explore the possibility of its use for future databases. PCA can be used for preliminary classification of variable stars, such as separating pulsating stars from Eclipsing binaries and distinguishing different variability classes.

We present the Fourier decomposition technique using 
Levenberg-Marquardt algorithm for non-linear least square fit- 
ting (Press et al. 1992) in Sect. 2. We also describe the unit-lag 
auto-correlation function for finding the optimal order of the fit. 
Sect. 3 describes the PCA for dimensionality reduction and light 
curve reconstruction. Sect. 4 describes the results obtained by 
the FD and PCA techniques when applied to study the structure 
of Cepheid light curves. In addition, we compare the relative per- 
formance of FD and PCA for classification of various variability 
classes in the database selected for the present analysis. Finally 
in Sect. 5, we present important conclusions of the study. 

2. Fourier Decomposition technique 

Since the light curves of the selected ensemble of variable stars 
are periodic, they can be written as a sum of cosine and sine 
series: 

$$m(t) = A_0 + \sum_{i=1}^{N} a_i \cos\left(i\omega(t - t_0)\right) + \sum_{i=1}^{N} b_i \sin\left(i\omega(t - t_0)\right), \qquad (1)$$

where $m(t)$ is the observed magnitude at time $t$, $A_0$ is the mean magnitude, $a_i$, $b_i$ are the amplitude components of the $(i-1)$th harmonic, $P$ is the period of the star in days, $\omega = 2\pi/P$ is the angular frequency, and $N$ is the order of the fit. $t_0$ is the epoch of maximum light. Obviously, Eq. (1) has $2N + 1$ unknown parameters, which require at least the same number of data points to solve for these parameters. Equivalently, we can write Eq. (1) as

$$m(t) = A_0 + \sum_{i=1}^{N} A_i \cos\left(i\omega(t - t_0) + \phi_i\right), \qquad (2)$$

where $A_i = \sqrt{a_i^2 + b_i^2}$ and $\tan\phi_i = -b_i/a_i$. Since the period is known from the respective databases, the observation time can be folded into phase ($\Phi$) as (cf. Ngeow et al. 2003)

$$\Phi = \frac{t - t_0}{P} - \mathrm{Int}\left(\frac{t - t_0}{P}\right). \qquad (3)$$

The value of $\Phi$ is from 0 to 1, corresponding to a full cycle of pulsation, and Int denotes the integer part of the quantity. Hence, Eqs. (1) and (2) can be written as (Schaltenbrand & Tammann 1971)

$$m(t) = A_0 + \sum_{i=1}^{N} a_i \cos\left(2\pi i \Phi(t)\right) + \sum_{i=1}^{N} b_i \sin\left(2\pi i \Phi(t)\right), \qquad (4)$$

$$m(t) = A_0 + \sum_{i=1}^{N} A_i \cos\left[2\pi i \Phi(t) + \phi_i\right], \qquad (5)$$

with relative Fourier parameters as

$$R_{i1} = \frac{A_i}{A_1}; \qquad \phi_{i1} = \phi_i - i\phi_1,$$

where $i > 1$. The combination of coefficients $R_{i1}$, $\phi_{i1}$ where $i = 2, 3, 4, \ldots$ can be used to describe the progression of light curve shape in the case of Cepheids and other variables and can be used for variable star classification. In Table 1, we list all the variable star light curve data that have been subjected to the analysis. In the case of the data taken from the OGLE database (Soszyński et al. 2003, 2008 and Wyrzykowski et al. 2003, 2004), the number of stars appears to be larger than the actual number in the database. This is because we have not removed the stars that overlap between different OGLE fields, as this does not affect our analysis. In the case of data from Martin et al. (1979), the stars with poor phase coverage have been left out.
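The folding of Eq. (3), the fit of Eq. (4) and the relative parameters $R_{i1}$, $\phi_{i1}$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: because Eq. (4) is linear in $a_i$ and $b_i$ once the period is fixed, the sketch uses ordinary linear least squares rather than the Levenberg-Marquardt fitting adopted in this paper, and the function names `fold_phase`, `fit_fourier` and `relative_parameters`, as well as the synthetic light curve, are hypothetical.

```python
import numpy as np


def fold_phase(t, period, t0=0.0):
    """Fold observation times into phase in [0, 1), following Eq. (3)."""
    cycles = (np.asarray(t, dtype=float) - t0) / period
    # floor() equals Int() for t >= t0 and keeps the phase non-negative otherwise.
    return cycles - np.floor(cycles)


def fit_fourier(phase, mag, order):
    """Linear least-squares fit of Eq. (4); returns A0, a[1..N], b[1..N]."""
    harmonics = np.arange(1, order + 1)
    arg = 2.0 * np.pi * np.outer(phase, harmonics)
    design = np.hstack([np.ones((len(phase), 1)), np.cos(arg), np.sin(arg)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(mag, dtype=float), rcond=None)
    return coeffs[0], coeffs[1:order + 1], coeffs[order + 1:]


def relative_parameters(a, b):
    """Return R_i1 = A_i / A_1 and phi_i1 = phi_i - i * phi_1 for i = 2..N."""
    amp = np.hypot(a, b)                 # A_i = sqrt(a_i^2 + b_i^2)
    phi = np.arctan2(-b, a)              # tan(phi_i) = -b_i / a_i
    i = np.arange(2, len(a) + 1)
    r_i1 = amp[1:] / amp[0]
    phi_i1 = np.mod(phi[1:] - i * phi[0], 2.0 * np.pi)  # wrapped to [0, 2*pi)
    return r_i1, phi_i1


# Synthetic, noise-free light curve (hypothetical values, not survey data).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 50.0, 200))
phase = fold_phase(t, period=3.0)
mag = (15.5 + 0.3 * np.cos(2 * np.pi * phase + 0.4)
       + 0.1 * np.cos(4 * np.pi * phase + 1.1))

a0, a, b = fit_fourier(phase, mag, order=2)
r_i1, phi_i1 = relative_parameters(a, b)
print(a0, r_i1[0], phi_i1[0])
```

For this noise-free input the fit should recover values close to $A_0 = 15.5$, $R_{21} = 1/3$ and $\phi_{21} = 0.3$ rad, which follow directly from the way the synthetic curve was built.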

The estimation of the optimal number of terms to be used in the Fourier decomposition of an individual light curve is not straightforward. As has been pointed out by Petersen (1986), if $N$ is chosen too small, more Fourier parameters could have been calculated from the given observations and the resulting parameters will have systematic deviations from the best estimate. On the other hand, if $N$ is chosen too large, we are fitting the noise. Following Baart (1982), Petersen (1986) adopted the calculation of the unit-lag auto-correlation of the sequence of the residuals in order to decide the right $N$ so that the residuals consist of noise only. It is defined as

$$\rho = \frac{\sum_{j=1}^{n-1} (v_j - \bar{v})(v_{j+1} - \bar{v})}{\sum_{j=1}^{n} (v_j - \bar{v})^2},$$

where $v_j$ is the $j$th residual, $\bar{v}$ is the average of the residuals and $j = 1, \ldots, n$ runs over the data points of a light curve. The values $v$ are the residuals of the fitted light curve,

$$v = m(t) - \left[A_0 + \sum_{i=1}^{N} A_i \cos\left(2\pi i \Phi(t) + \phi_i\right)\right].$$

It should be noted that for the calculation of $\rho$ we must choose the ordering of $v_j$ given by increasing phase values rather than the ordering given by the original sequence. A definite trend in the residuals will result in a value of $\rho$ close to 1, while uncorrelated residuals give smaller values of $\rho$. In the idealized case of residuals of equal magnitude with alternating sign, $\rho$ will be approximately equal to -1. The suitable value of $\rho$ can be chosen using Baart's condition. According to this, a value of $\rho > [n - 1]^{-1/2}$ (where $n$ is the number of observations) is an indication that it is likely that a trend is present, whereas a value of $\rho < [2(n - 1)]^{-1/2}$ indicates that it is unlikely that a trend is present. Baart therefore used the following auto-correlation cut-off tolerance

$$\rho_c = \rho(\mathrm{cut}) = [2(n - 1)]^{-1/2}. \qquad (6)$$
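As a rough illustration of the unit-lag auto-correlation and of Baart's cut-off in Eq. (6), the following Python sketch may be useful. The function names `unit_lag_autocorrelation` and `baart_cutoff` and the two synthetic residual sequences are assumptions made for this example; they are not taken from the code used for the analysis.

```python
import numpy as np


def unit_lag_autocorrelation(phase, residuals):
    """Unit-lag auto-correlation of fit residuals ordered by increasing phase."""
    order = np.argsort(phase)
    v = np.asarray(residuals, dtype=float)[order]
    dv = v - v.mean()
    return np.sum(dv[:-1] * dv[1:]) / np.sum(dv ** 2)


def baart_cutoff(n):
    """Cut-off tolerance of Eq. (6) for n observations."""
    return (2.0 * (n - 1)) ** -0.5


n = 100
phase = (np.arange(n) + 0.5) / n
alternating = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)   # idealized alternating signs
trend = np.sin(2.0 * np.pi * phase)                         # smooth leftover signal

rho_alt = unit_lag_autocorrelation(phase, alternating)
rho_trend = unit_lag_autocorrelation(phase, trend)
print(rho_alt, rho_trend, baart_cutoff(n))
```

In a fitting loop one would typically raise the order $N$ until the value of $\rho$ for the residuals falls below the cut-off; the alternating sequence gives a value near -1 and the smooth leftover signal a value near 1, matching the two limiting cases described above.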
[Figure 1 panels: HV5497 (Period = 98.9750), HV821 (Period = 127.3140), HV2883 (Period = 108.9680) and OGLE-1108 (Period = 3.0379); each panel plots magnitude against phase (Φ) from 0.0 to 1.0.]

Fig. 1. Fitted light curves for fundamental mode long period Cepheids from Moffett et al. (1998).

[Figure 2 panels: magnitude against phase (Φ) from 0.0 to 2.0 for each class of variable.]

While computing the Fourier parameters of all the light curve data selected for the present analysis, we have ensured that Baart's condition is satisfied. The optimal orders of the fit for RRc, RRab, FU Cepheids (OGLE), First Overtone (FO) Cepheids, Eclipsing binaries and Mira variables are 3, 5, 12, 10, 4 and 4 respectively. The longer period data for FU Cepheids from Martin et al. (1979) and Moffett et al. (1998) are fitted with a fifth-order fit because of the relatively small number of data points. A typical example of the fitted light curves of all types of variables with the optimal order of the fit is shown in Fig. 2.

All the data sets in Table 1 are finally fitted with the optimal order of the fit and the fitted light curves are used to derive the Fourier phase and amplitude parameters from the Fourier coefficients. Fig. 1 shows the fitted light curves of FU Cepheids. Although the longer period Cepheids have fewer data points and sparser phase coverage, the coverage is sufficient for the Fourier decomposition and the fits are reasonably good. The lower right panel shows an example of a short period fundamental mode Cepheid from the OGLE-III database which has good phase coverage. $\chi^2_\nu$ is the chi-square per degree of freedom ($\nu$) of the fit, where the number of degrees of freedom ($\nu$) is the number of data points minus the number of parameters of the fit. The Fourier decomposition parameters ($a_i$, $b_i$) for Cepheids have been computed based on the optimal order of the fit determined by the calculation of the unit-lag auto-correlation function.
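The chi-square per degree of freedom quoted for the fits can be written out as a short Python sketch. It assumes that the number of fitted parameters is the $2N + 1$ of Eq. (1) and that a single photometric error `sigma` applies to every point; both the function name `reduced_chi_square` and the numbers in the example are hypothetical.

```python
import numpy as np


def reduced_chi_square(mag, model, sigma, order):
    """Chi-square per degree of freedom for a Fourier fit of the given order."""
    mag = np.asarray(mag, dtype=float)
    n_params = 2 * order + 1            # A0 plus N cosine and N sine coefficients
    dof = mag.size - n_params           # data points minus fitted parameters
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = np.sum(((mag - np.asarray(model, dtype=float)) / sigma) ** 2)
    return chi2 / dof


# Hypothetical numbers: 25 points, each one sigma away from a 2nd-order model.
model = np.full(25, 14.0)
mag = model + 0.02
chi2_nu = reduced_chi_square(mag, model, sigma=0.02, order=2)
print(chi2_nu)
```

With 25 points and 5 parameters there are 20 degrees of freedom, so residuals of exactly one sigma give a value of 25/20.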

3. Principal component analysis

The principal component analysis transforms the original set of p variables by an orthogonal transformation to a new set of uncorrelated variables or principal components (PCs). It involves a simple rotation from the original axes to the new ones, resulting in principal components in decreasing order of importance. The first few q components (q ≪ p) usually contain most of the variation in the original data (Chatfield & Collins 1980, Murtagh & Heck 1987). This feature of the PCA has been used in astronomical data analysis primarily for the purpose of reducing the dimensionality of the data and as a preprocessor for other automated techniques like Artificial Neural Networks (ANN). The application of PCA to the light curve analysis of variable stars has been limited to a few studies (Hendry et al. 1999, Kanbur et al. 2002, 2004, Tanvir et al. 2005). In the following, we briefly describe the transformation.

Fig. 2. Fitted light curves of different classes of variables used in the analysis obtained with the optimal order of the fit. The caption at the top of each panel shows the variable name, period and type of variables respectively. We have RR Lyrae variables (RRc, RRab), Cepheid variables (Fundamental mode (FU) and First Overtone (FO)), Eclipsing binaries (EB) and Mira variables (MIRA).

[Figure 3 panel labels: OGLE052525.66-693304.5, P = 157.5548 d; legend: real data, interpolated data; horizontal axis: phase (Φ).]

Fig. 3. Examples of interpolation of magnitudes for 100 points. The upper panel shows the light curve with 100 interpolated data points for the OGLE longer period Eclipsing binary while the lower panel shows the interpolated data of a long period Mira variable from the ASAS database. The lighter points denote the interpolated data while the bigger black dots represent the original data.

Let $m_{ij}$ be the $p$ magnitudes corresponding to $n$ light curves. Let us define the $n \times p$ matrix by $X = x_{ij}$,

$$x_{ij} = \frac{m_{ij} - \bar{m}_j}{s_j \sqrt{n}},$$

with

$$\bar{m}_j = \frac{1}{n} \sum_{i=1}^{n} m_{ij},$$

and

$$s_j^2 = \frac{1}{n} \sum_{i=1}^{n} (m_{ij} - \bar{m}_j)^2,$$

where $\bar{m}_j$ is the mean value and $s_j$ is the standard deviation. Using such standardization we find the principal components from the correlation matrix (cf. Murtagh & Heck 1987)

$$C_{jk} = \sum_{i=1}^{n} x_{ij} x_{ik} = \frac{1}{n} \sum_{i=1}^{n} (m_{ij} - \bar{m}_j)(m_{ik} - \bar{m}_k)/(s_j s_k), \qquad (7)$$

with the axis of maximum variance being the eigenvector $e_1$ associated with the largest eigenvalue $\lambda_1$ of the equation

$$C e_1 = \lambda_1 e_1. \qquad (8)$$

The next (second) axis is to be orthogonal to the first, and another solution of Eq. (8) gives the second largest eigenvalue $\lambda_2$ and the corresponding eigenvector or principal component $e_2$. Hence the proportion of the total variation accounted for by the $j$th component is $\lambda_j/p$, where $p$ is also the sum of the eigenvalues (Singh et al. 1998).

Let us suppose that the first $q$ principal components are sufficient to retain the information on the original $p$ variables. Therefore, we now have a ($p \times q$) matrix $E_q$ of eigenvectors. The projection vector $Z$ onto the $q$ principal components can be found by

$$Z = x E_q, \qquad (9)$$

where $x$ is the vector of magnitudes defined by

$$x_{ij} s_j \sqrt{n} + \bar{m}_j = m_{ij},$$

and can be represented by

$$x = Z E_q^T. \qquad (10)$$

We obtain the final light curve $x_{rec}$ by multiplying $x$ with $s_j \sqrt{n}$ and adding the mean. $Z$ is an ($n \times q$) matrix and $E_q^T$ is a ($q \times p$) matrix and hence the reconstructed light curve is the original ($n \times p$) matrix.
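The transformation of Eqs. (7) to (10) can be illustrated with a short NumPy sketch. It is not the code used for the analysis: the function name `pca_light_curves`, the use of `numpy.linalg.eigh` to solve Eq. (8) and the small synthetic input matrix are assumptions made for the example, and every column is assumed to have a non-zero standard deviation.

```python
import numpy as np


def pca_light_curves(m, q):
    """PCA of an (n x p) magnitude matrix via the correlation matrix, Eqs. (7)-(10)."""
    m = np.asarray(m, dtype=float)
    n, p = m.shape
    mean = m.mean(axis=0)                       # column means
    s = m.std(axis=0)                           # column standard deviations (1/n form)
    x = (m - mean) / (s * np.sqrt(n))           # standardized matrix X
    c = x.T @ x                                 # p x p correlation matrix, Eq. (7)
    eigval, eigvec = np.linalg.eigh(c)          # eigh returns ascending eigenvalues
    idx = np.argsort(eigval)[::-1]              # reorder to decreasing importance
    eigval, eigvec = eigval[idx], eigvec[:, idx]
    e_q = eigvec[:, :q]                         # p x q matrix of leading eigenvectors
    z = x @ e_q                                 # n x q projections, Eq. (9)
    x_rec = z @ e_q.T                           # Eq. (10)
    m_rec = x_rec * s * np.sqrt(n) + mean       # back to magnitudes
    percent = 100.0 * eigval / p                # share of variance per component
    return percent, z, m_rec


# Synthetic set: 50 curves on 20 phase points built from two shapes (hypothetical).
rng = np.random.default_rng(1)
grid = np.arange(20) / 20.0
shapes = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
m = 15.0 + rng.normal(size=(50, 2)) @ shapes

percent, z, m_rec = pca_light_curves(m, q=2)
print(percent[:3], np.abs(m_rec - m).max())
```

Because the synthetic curves are built from only two shapes, two PCs should carry essentially all of the variance and reproduce the input to within rounding error; real light curves would need more components.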

With the phase ($\Phi$) as epoch for each light curve available from Eq. (3), we interpolate and obtain 100 magnitudes for phase 0 to 1 in steps of 0.01. Therefore, each light curve now consists of 100 data points (magnitudes) normalized to unity. The inputs to the PCA are these 100 magnitudes for each of the light curves. We also emphasize that while applying PCA to the phased magnitudes of light curves, Fourier coefficients are not used to interpolate the light curves. We have used standard interpolation routines in IDL for generating interpolated magnitudes in a light curve. Two such examples of the result of interpolation are shown in Fig. 3. The Mira variable (lower panel) has 223 actual data points, from which 100 interpolated magnitudes have been obtained.

Table 2. The first 10 eigenvectors, their percentage of variance and the cumulative percentage of variance of 1829 fundamental mode Cepheids. The input matrix is an 1829 x 100 array.

PC    Eigenvalue    Percentage    Cum. Percentage

 1    41.0424       41.0424       41.0424
 2    22.8331       22.8331       63.8755
 3    11.7668       11.7668       75.6423
 4     5.4564        5.4564       81.0987
 5     3.6225        3.6225       84.7212
 6     2.4477        2.4477       87.1689
 7     1.3398        1.3398       88.5087
 8     0.7918        0.7918       89.3005
 9     0.6435        0.6435       89.9440
10     0.6395        0.6395       90.5835
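The resampling step can be sketched in Python as follows. The analysis itself used IDL interpolation routines, so this is only an approximate stand-in: it assumes periodic linear interpolation with `numpy.interp`, a grid of 100 phases from 0.00 to 0.99, and a min-max rescaling as one possible reading of "normalized to unity". The function name `resample_light_curve` and the synthetic light curve are hypothetical.

```python
import numpy as np


def resample_light_curve(phase, mag, n_points=100, normalize=True):
    """Interpolate a phased light curve onto an even grid of n_points phases."""
    grid = np.arange(n_points) / n_points       # 0.00, 0.01, ..., 0.99
    # period=1.0 makes np.interp treat phase as cyclic and sort the inputs itself.
    resampled = np.interp(grid, phase, mag, period=1.0)
    if normalize:
        # Assumed reading of "normalized to unity": rescale to the range [0, 1].
        resampled = (resampled - resampled.min()) / (resampled.max() - resampled.min())
    return grid, resampled


# Hypothetical light curve with 223 observations and a smooth cosine shape.
phase = (np.arange(223) + 0.5) / 223
mag = 13.0 + 0.5 * np.cos(2.0 * np.pi * phase)

grid, raw = resample_light_curve(phase, mag, normalize=False)
_, unit = resample_light_curve(phase, mag)
print(raw[:3], unit.min(), unit.max())
```

Stacking the resampled curves row by row then gives the n x 100 input matrix for the PCA, with every light curve brought to the same dimension regardless of how many observations it originally had.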

4. Analysis of light curves 

In the subsequent analysis, we compare the capabilities of FD 
and PCA for structural analysis of Cepheids and classification 
accuracy for different classes of variable stars. 

4.1. Structural Analysis & Classification 

4.1.1. Fundamental mode (FU) Cepheids 

We use the light curve data for 1829 FU classical Cepheids from 
various sources as mentioned in Table 1 (Data set IIA+IIB+IIC). 
The majority of the data used in the analysis are from the OGLE 
database. The Fourier decomposition of all the 1829 Cepheid 
light curves has been independently done by us for the calcu- 
lation of the Fourier decomposition parameters as described in 
Sect. 2. We have seen that all the Cepheid light curves selected in 
the present study give satisfactory light curve shape with no nu- 
merical bumps or wiggles when reconstructed using the Fourier 
parameters. 

PCA is performed on an input matrix consisting of a 1829 x 100 array corresponding to 100 magnitudes from phase 0 to 1 for 1829 FU Cepheids. The output of the PCA is shown in Table 2. We see that the first 10 PCs contain nearly 90 percent of the variance in the data. Fig. 4 shows the reconstruction of four FU Cepheid light curves using the first 1, 3, 7 and 10 PCs.
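The choice of how many PCs to keep can be read off the cumulative percentages, as the following Python sketch shows using the eigenvalues listed in Table 2. With p = 100 standardized input variables the eigenvalues of the correlation matrix sum to 100, which is why the eigenvalue and percentage columns of Table 2 coincide. The helper `components_needed` and the two thresholds are illustrative assumptions.

```python
import numpy as np

# Eigenvalues of the first 10 PCs, copied from Table 2.
eigenvalues = np.array([41.0424, 22.8331, 11.7668, 5.4564, 3.6225,
                        2.4477, 1.3398, 0.7918, 0.6435, 0.6395])
p = 100                                   # number of input variables (phase points)

percentage = 100.0 * eigenvalues / p      # lambda_j / p expressed in percent
cumulative = np.cumsum(percentage)


def components_needed(cumulative, threshold):
    """Smallest number of leading PCs whose cumulative percentage reaches threshold."""
    idx = int(np.searchsorted(cumulative, threshold))
    return idx + 1 if idx < len(cumulative) else None


print(np.round(cumulative, 4))
print(components_needed(cumulative, 75.0), components_needed(cumulative, 90.0))
```

For these values, 3 PCs are needed to pass 75 percent of the variance and 10 PCs to pass 90 percent, in line with the last column of Table 2.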

Kanbur et al. (2002) have tried to explain the resonances using the PCA on the Fourier coefficients ($a_i$, $b_i$). But due to the relatively small number of data points they did not reach any definite conclusions about some of the resonances suggested by Antonello & Morelli (1996) in the period range 1.38 < log P < 1.43. By doing the PCA analysis of the same data as used by Antonello & Morelli (1996), Kanbur et al. (2002) could not find any feature in that period range. Based on the available light curves covering a wide range of periods, we have plotted R21, R31, φ21, φ31 versus log P in Fig. 5. It is very evident from the plots that there is a definite structural change in the Fourier coefficients at periods log P ~ 1.0 and 1.5, the latter being close to the period range 1.38 < log P < 1.43 suggested by Antonello & Morelli (1996). We see that the Fourier decomposition parameters R21 and R31 decrease till log P ~ 1.0, increase thereafter till log P ~ 1.5 and after that R21 and R31 fall gradually again till log P ~ 2.10. Similarly in the φ21 and φ31 plane, we see a sharp discontinuity around log P ~ 1. The sharp and more prominent discontinuity around log P ~ 1.0 is reflected in both the φ21 and φ31 plots, whereas the other changes in the light curve structures around the period log P ~ 1.5 are visible in all the Fourier parameter plots.

Fig. 4. Reconstruction of FU Cepheid light curves using the first 1, 3, 7 and 10 principal components. The input matrix is an array of 1829 rows (stars) and 100 columns (magnitudes from phase 0 to 1). The black dots represent the original 100 interpolated data points normalized to unit magnitude.

Fig. 5. Fourier parameters R21, R31, φ21, φ31 as a function of log (Period) for the 1829 FU Cepheids (Data set IIA+IIB+IIC, Table 1). The Fourier parameters for the I band stars and V band stars are marked with filled circles and filled upper triangles respectively.

In Fig. 6 we plot the first two PCs and PC1×PC2 (PC1x2) against log P. For PC1, PC2 and PC1x2, a discontinuity around log P = 1.0 is quite visible. PC1, PC2 and PC1x2 also clearly show a change around the period log P ~ 1.5. But the discontinuity around log P ~ 1 as revealed by the Fourier parameters φ21 and φ31 in Fig. 5 is much more pronounced than in the PC plots.

Kanbur et al. (2002), using PCA on the Fourier coefficients, did not find any structural changes in the period range 1.38 < log P < 1.43. Using PCA on a larger light curve data set, we have found that there are in fact structural changes around log P ~ 1 and 1.5 and hence there may exist resonances around these periods. While the resonance around the period log P ~ 1 is well known, the first two PCs and PC1x2 show a change in the light curve structure around log P = 1.5. It is difficult to pinpoint the exact location of the change in structure because there are fewer stars with periods around log P ~ 1.5. Model calculations are necessary to confirm the existence of this resonance. Further, Antonello & Poretti (1996) also used a number of data points on the longer period side and found some evidence of a decrease of R21 at longer periods (log P ~ 2). It is difficult to confirm the existence of such a resonance from either FD or PCA, although we see some change in trend in the first two PCs around this period. Therefore, although there are changes in the light curve structures around log P ~ 1.5 and 2.10, one cannot confirm the existence of resonances around these periods. Information about such resonances is generally derived from combined photometric and spectroscopic observations and radiative hydrodynamical model calculations (Kienzle et al. 1999).

4.1.2. First overtone (FO) Cepheids 

The light curves of FO Cepheids show a discontinuity in the Fourier phase parameters φ21 and φ31 around a period of ~ 3.2 days. This is shown in Fig. 7 for the OGLE data (Data set III) of 1228 FO Cepheids. This feature was interpreted as the signature of a 2:1 resonance between the first and fourth overtones (Antonello & Poretti 1986). This feature was however not reproduced in the hydrodynamical models and in the Fourier parameters of highly accurate observational radial velocity curves of FO Cepheids (Kienzle et al. 1999). By means of hydrodynamical models for FO Cepheids, Kienzle et al. (1999) have shown that the 3.2 d feature is not a resonance and that the true resonance is at around 4.5 d. On the other hand, Buchler et al. (1996) had suggested that for a consistent picture of the evolutionary Mass-Luminosity relations, the FO Cepheid resonance should occur at P = 4.3 days. Therefore, not all such structures in the photometric Fourier parameters need to be connected to resonances.

Separately, by analyzing the Fourier coefficients of a large number of FO LMC Cepheids in the OGLE III database, Soszyński et al. (2008) found a change in the photometric Fourier parameters around a period of ~ 0.35 d. The short-period discontinuity at 0.35 d can be explained by the presence of the 2:1 resonance between the first and fifth overtones in stars with masses of about 2.5 $M_\odot$ (Dziembowski & Smolec 2009).

Fig. 6. First two PCs as a function of log (Period) for the 1829
FU Cepheids (Data set IIA+IIB+IIC, Table 1). The Fourier
parameters for the I band stars and V band stars are marked with
filled circles and filled upper triangles respectively.



In Fig. 7, we plot the Fourier parameters R21, φ21, R31, φ31
for 1228 LMC FO Cepheids (Data set III in Table 1). The optimal
order of the Fourier fit has been found to be 10. There is a
clearly marked discontinuity in the Fourier plots at periods
around 0.35 and 3.2 days.
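
As an illustration of how such parameters can be obtained, the
following is a minimal sketch of an order-10 Fourier fit to a phased
light curve. It assumes the sine-series convention
m(Φ) = A0 + Σ A_i sin(2πiΦ + φ_i), with R_i1 = A_i/A_1 and
φ_i1 = φ_i - iφ_1. The function names and the synthetic test curve are
hypothetical and are not taken from the analysis in this paper.

```python
import numpy as np

def fourier_fit(phase, mag, order=10):
    """Least-squares Fourier fit; returns A0, amplitudes A_i and phases phi_i.

    Assumed convention: m = A0 + sum_i A_i * sin(2*pi*i*phase + phi_i).
    """
    harmonics = np.arange(1, order + 1)
    arg = 2.0 * np.pi * np.outer(phase, harmonics)      # (n_points, order)
    design = np.hstack([np.ones((len(phase), 1)), np.cos(arg), np.sin(arg)])
    coeffs, *_ = np.linalg.lstsq(design, mag, rcond=None)
    a0 = coeffs[0]
    a = coeffs[1:order + 1]       # cosine coefficients: a_i = A_i sin(phi_i)
    b = coeffs[order + 1:]        # sine coefficients:   b_i = A_i cos(phi_i)
    return a0, np.hypot(a, b), np.arctan2(a, b)

def fourier_parameters(amp, phi):
    """Amplitude ratios R21, R31 and phase differences phi21, phi31 in [0, 2*pi)."""
    two_pi = 2.0 * np.pi
    r21, r31 = amp[1] / amp[0], amp[2] / amp[0]
    phi21 = (phi[1] - 2.0 * phi[0]) % two_pi
    phi31 = (phi[2] - 3.0 * phi[0]) % two_pi
    return r21, r31, phi21, phi31

# Synthetic, noise-free light curve with known harmonics (illustrative values only).
rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 1.0, 200)
mag = (15.0
       + 0.30 * np.sin(2 * np.pi * phase + 1.0)
       + 0.12 * np.sin(4 * np.pi * phase + 2.5)
       + 0.06 * np.sin(6 * np.pi * phase + 0.7))

a0, amp, phi = fourier_fit(phase, mag, order=10)
r21, r31, phi21, phi31 = fourier_parameters(amp, phi)
```

For this synthetic curve the fit recovers R21 = 0.4 and φ21 = 0.5 rad
by construction. With real survey data the recovered values depend on
the phase coverage and noise of each light curve.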

We now examine whether our PCA procedure can extract the
information about these structural changes. We carry out
the PCA on a 1228 × 100 matrix of 1228 LMC FO Cepheids with
100 I band magnitudes corresponding to phase 0 to 1 in steps of
0.01. Fig. 8 shows the plot of the first three PCs versus the
period. A sharp discontinuity at the short period end near 0.35
day is evident in all the PC plots. Also, some change in the light
curve structure seems to occur near 4 days in all the PC plots.
There is no change in the light curve structure around 3.2 days
in PC2 and PC3, whereas PC1 shows a change in the light
curve shape around a period of ~ 3.2 days.
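
The PCA step itself can be sketched as follows. This is a minimal
illustration using a singular value decomposition of the
column-centred matrix. Whether the analysis in this paper additionally
scales the columns is not stated in this section, so the centring-only
choice is an assumption. The matrix below is synthetic (random periods
and a hypothetical period-dependent second harmonic), standing in for
the 1228 × 100 array of interpolated magnitudes.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """PCA via SVD of the column-centred data matrix X (rows = stars).

    Returns the PC scores, the principal axes (eigenvectors) and the
    fraction of the total variance carried by each retained component.
    """
    Xc = X - X.mean(axis=0)                          # centre every phase bin
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # projections onto the axes
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic stand-in for the light curve matrix (not real OGLE data).
rng = np.random.default_rng(1)
n_stars, n_bins = 1228, 100
grid = np.arange(n_bins) / n_bins                    # phase 0.00 ... 0.99
period = rng.uniform(0.3, 6.0, n_stars)              # hypothetical periods in days
shape = 0.1 + 0.05 * period                          # hypothetical shape parameter
X = (np.sin(2 * np.pi * grid)[None, :]
     + shape[:, None] * np.sin(4 * np.pi * grid)[None, :]
     + rng.normal(0.0, 0.01, (n_stars, n_bins)))

scores, axes, explained = pca_scores(X, n_components=3)
# scores[:, 0] plotted against period is the analogue of a PC1 versus P diagram.
```

Each row of `scores` replaces the 100 magnitudes of one star by three
numbers, which is the sense in which the PCs compress the light curves.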

Thus, we see that the Fourier parameters performed better in 
bringing out the structural changes in FU Cepheids while for FO 
Cepheids the performance of FD and PCA techniques is similar. 

4.1.3. Classification 

We now explore the possibility of classifying different
classes of variable stars on the basis of FD and PCA. We use
the Fourier decomposition parameter R21 and the first principal
component PC1 to classify all the 17,606 stars of different
variability classes in Table 1. In Fig. 9 we plot the Fourier
parameter R21 versus log P. As may be seen, the Mira variables
form a separate group because of their longer periods and not
because of a separation in R21. However, in the intermediate
period range (1 to 100 days), Eclipsing binaries have R21 values
distinct from those of other classes of variables. Fig. 10 shows
a plot of log R21 to demonstrate the complete range of R21 for
the 4085 Eclipsing binaries. In the short period range there is
a considerable overlap between the FO Cepheids and RRc stars.
This degeneracy in the Fourier parameter R21 in the short period
range cannot be lifted and the classification accuracy cannot be
improved by further manipulation.

Fig. 7. Fourier parameters R21, R31, φ21, φ31 as a function of log
(Period) for 1228 LMC overtone Cepheids (Data set III).

Fig. 8. First three PCs versus Period (in days) for LMC overtone
Cepheids (Data set III). The changes in the light curve shape shown
in Fig. 7 are also seen in the PC plots. The input matrix is an
array of 1228 rows (stars) and 100 columns (magnitudes from
phase 0 to 1).

Fig. 9. The classification based on R21 obtained from the FD
method. L and S denote the LMC and SMC objects respectively.

Fig. 10. The classification based on log R21 obtained from the FD
method.

We carry out the PCA on a 17606 × 100 matrix of 17,606
stars, each star having 100 magnitude values in its light
curve. We have used the first principal component (PC1) as it
contains the maximum variance in this data set. As in the case
of FD, the PCA is able to separate the Mira variables and the
Eclipsing binaries, and the separation is more effective in the
case of PCA (Fig. 11). The plot in the PC1-log P plane also shows
that although PC1 is able to separate the Eclipsing binaries and
Mira variables, there is some overlap in the regions dominated
by RR Lyraes and Cepheids. In the next step, we choose only
the samples of RR Lyraes (RRab & RRc) and Cepheids (FU &
FO) that could not be separated well by the use of PCA on the
whole data set. We now run PCA on 10,643 light curves (Data
set IA+IB+IIA+IIB+IIC+III) of RR Lyraes and Cepheids. The
result of PCA on this 10643 × 100 array is shown in Fig. 12.
It may be noted that PC1 is able to separate FU Cepheids and
RRab stars to a large extent, while there is some overlap between
RRc and FO Cepheids in a narrow period range (0.25-0.5 d). We
hope to return to this degeneracy problem in a subsequent study
in which we also intend to increase the sample by adding more
classes of variables.



5. Conclusions 

Fourier decomposition is a trusted and much applied technique
for analyzing the behaviour of light curves of periodic variable
stars. It is well suited for studying individual light curves as the
Fourier parameters can be easily determined. However, when
the purpose is to tag a large number of stars with their variability
class using photometric data from large surveys, the technique
becomes slow and cumbersome because each light curve has to be
fitted individually and then analyzed. The same is true if the aim
is to look for resonances in the light curves in an automated way
for a large class of pulsators. It is, therefore, desirable to look for
methods that are reliable, automated and unsupervised and can
be applied to the available light curve data directly.

Some attempts have been made in the recent past to use the
well known PCA for light curve analysis, but the major drawback
of these studies was that they required the calculation of the
Fourier parameters, which then served as input to the PCA. This
meant that the PCA, which was supposed to replace the Fourier
decomposition, in fact relied on it. Also, for a precise and
accurate determination of the Fourier parameters, the light curve
should have good phase coverage and low-noise data points so that
the fit is good enough for its parameters to be relied upon. This
is not the case for every light curve generated by automated
surveys: sometimes there are gaps and/or outliers in the data,
and fitting such a light curve gives a wrong estimate of the
Fourier parameters.

In this paper we have used the original light curve data for
computation of the principal components. It involves four simple
steps: a) phasing every light curve between 0 and 1 with its
respective period in days, b) interpolating the light curve
magnitudes in short steps (0.01) between phase 0 and 1 to obtain
100 magnitude points for each light curve, c) normalizing the
magnitudes between 0 and 1 for each of the light curves, and
d) performing PCA on the normalized magnitudes of 100 points for
all the light curves.
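
Steps a) to c) above can be sketched as below. This is a minimal
illustration, assuming linear interpolation with periodic wrap-around
and a zero reference epoch, neither of which is specified here. The
function name `light_curve_to_vector` and the synthetic light curve
are hypothetical.

```python
import numpy as np

def light_curve_to_vector(time, mag, period, epoch=0.0, n_bins=100):
    """Fold, interpolate and normalise one light curve to n_bins values."""
    # a) phase every observation between 0 and 1 with the period (in days)
    phase = ((np.asarray(time) - epoch) / period) % 1.0
    # b) interpolate onto phases 0.00, 0.01, ..., 0.99; with period=1.0,
    #    np.interp sorts the points itself and wraps around the phase boundary
    grid = np.arange(n_bins) / n_bins
    interp = np.interp(grid, phase, np.asarray(mag), period=1.0)
    # c) normalise the magnitudes between 0 and 1
    lo, hi = interp.min(), interp.max()
    if hi == lo:
        raise ValueError("constant light curve cannot be normalised")
    return (interp - lo) / (hi - lo)

# Synthetic sinusoidal variable with irregular sampling (illustrative only).
rng = np.random.default_rng(2)
period = 3.2                                   # hypothetical period in days
time = rng.uniform(0.0, 50.0, 300)
mag = 15.0 + 0.3 * np.sin(2 * np.pi * time / period)

vector = light_curve_to_vector(time, mag, period)
# d) stacking one such vector per star gives the N x 100 input matrix for PCA
matrix = np.vstack([vector])
```

Because no model is fitted to the individual light curve, this
preparation costs only one fold and one interpolation per star, which
is consistent with the speed advantage over the Fourier fit reported
in this paper.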

The PCA is then used to analyse the structure of the light
curves of classical Cepheids and the results are compared with
those obtained from the analysis of the Fourier parameters. In
addition, the two techniques are compared in terms of their
ability to classify stars into different variability classes.

We applied the PCA technique to study the structure of light
curves of fundamental and first overtone Cepheids. By choosing
a large data set covering a wide range of periods we have shown
that the structure of the fundamental mode Cepheid light curves
shows significant changes around the periods log P ~ 1 and 1.5.
The resonance around the period log P ~ 1 is well known. The first
two PCs also show that the behavior of the light curves changes
around the period log P ~ 1.5, which is close to the resonance
suggested by Antonello & Poretti (1996) in the period range
1.38 < log P < 1.43. There is also some evidence of a structural
change in the light curve shape around the period log P ~ 2.0,
but this can be confirmed only when longer period data become
available. We find that the Fourier parameters performed
better in bringing out the discontinuities in FU light curves at
periods around log P ~ 1.

For the first overtone LMC Cepheids, we find a discontinuity
at a shorter period of ~ 0.35 d. The first few PCs also show
a clear trend of structural changes of the first overtone Cepheids
at this short period. For FO Cepheids, the performance of FD
and PCA is similar in bringing out the structural changes around
a period of 0.35 day. We have been able to find this feature
because of the availability of a significant number of light
curves towards the shorter period end of the LMC Cepheids in the
OGLE database. The PCA technique can easily find similar resonances
in the Galactic and SMC first overtone Cepheids as and when
substantial data become available for the short period objects of
this class.

We have also demonstrated the ability of PCA, and its distinct
advantage over the FD method, in classifying stars into different
variability classes. Although alternative automated methods for
variable star classification exist, the PCA based technique can
be used as a first step in a hierarchical classification scheme
because of its accuracy and efficiency.

The data compression ratio achieved by using PCA on the direct
light curve data is enormous, a fact that has great relevance when
dealing with large databases of the future. Also, we have shown
some preliminary results of variable star classification for an
ensemble of 17,606 stars selected in the present analysis. In a
future paper, we will describe the application of the PCA technique
to a larger, more diverse database by looking at the classification
accuracy and errors.